{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# Postgres Vector Store\n",
    "In this notebook we are going to show how to use [Postgresql](https://www.postgresql.org) and  [pgvector](https://github.com/pgvector/pgvector)  to perform vector searches in LlamaIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c2d1c538",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:14.461233Z",
     "start_time": "2023-06-08T13:06:14.451565Z"
    }
   },
   "outputs": [],
   "source": [
    "# import logging\n",
    "# import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import SimpleDirectoryReader, StorageContext\n",
    "from llama_index.indices.vector_store import VectorStoreIndex\n",
    "from llama_index.vector_stores import PGVectorStore\n",
    "import textwrap\n",
    "import openai"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "26c71b6d",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "The first step is to configure the openai key. It will be used to created embeddings for the documents loaded into the index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "67b86621",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:16.584215Z",
     "start_time": "2023-06-08T13:06:16.576508Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<your key>\"\n",
    "openai.api_key = \"<your key>\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `paul_graham_essay` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "c154dd4b",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:19.009847Z",
     "start_time": "2023-06-08T13:06:19.001593Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: 4842475e-cf28-4b77-885d-9c2dfb5658f4 Document Hash: 77ae91ab542f3abb308c4d7c77c9bc4c9ad0ccd63144802b7cbe7e1bb3a4094e\n"
     ]
    }
   ],
   "source": [
    "documents = SimpleDirectoryReader(\"../data/paul_graham\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id, \"Document Hash:\", documents[0].doc_hash)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Create the index\n",
    "Here we create an index backed by Postgres using the documents loaded previously. PGVectorStore takes a few arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "8731da62",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:23.856002Z",
     "start_time": "2023-06-08T13:06:20.486349Z"
    }
   },
   "outputs": [],
   "source": [
    "vector_store = PGVectorStore.from_params(\n",
    "    database=\"vector_db\",\n",
    "    host=\"localhost\",\n",
    "    password=\"\",\n",
    "    port=5432,\n",
    "    user=\"postgres\",\n",
    "    table_name=\"paul_graham_essay\",\n",
    ")\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the index\n",
    "We can now ask questions using our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0a2bcc07",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:29.486725Z",
     "start_time": "2023-06-08T13:06:27.565039Z"
    }
   },
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Who is the author?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "8cf55bf7",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:32.686121Z",
     "start_time": "2023-06-08T13:06:32.680098Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The author is Paul Graham.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:41.307336Z",
     "start_time": "2023-06-08T13:06:34.185641Z"
    }
   },
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "fdf5287f",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:06:44.174719Z",
     "start_time": "2023-06-08T13:06:44.163087Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The author grew up writing essays on topics they had stacked up, thinking about what to work on,\n",
      "and exploring the implications of the internet. They also studied Italian, explored Florence,\n",
      "painted people and still lifes, and worked on software projects. They wrote essays for others to\n",
      "read and published them online. They also had dinners with friends and bought a building in\n",
      "Cambridge.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "### Querying existing index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:07:11.849139Z",
     "start_time": "2023-06-08T13:07:09.040607Z"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from llama_index import load_index_from_storage\n",
    "\n",
    "vector_store = PGVectorStore.from_params(\n",
    "    database=\"vector_db\",\n",
    "    host=\"localhost\",\n",
    "    password=\"\",\n",
    "    port=5432,\n",
    "    user=\"postgres\",\n",
    "    table_name=\"paul_graham_essay\",\n",
    ")\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Who is the author?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-08T13:07:19.217191Z",
     "start_time": "2023-06-08T13:07:19.204408Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The author is Paul Graham.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
